

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

We thank all the reviewers for taking the time to read and comment on our work. We will use the comments to improve the paper. Below we comment on some specific issues that were raised. R1: These are good points regarding the experiments; we will update the plots following these suggestions. Note that uniform and Lipschitz are the same in some plots because the rows of the data are normalized (Lipschitz can still give improvements here because it depends on the potentially smaller Lipschitz constant of the deterministic part).


Prompt Stability Scoring for Text Annotation with Large Language Models

Barrie, Christopher, Palaiologou, Elli, Törnberg, Petter

arXiv.org Artificial Intelligence

Researchers are increasingly using language models (LMs) for text annotation. These approaches rely only on a prompt telling the model to return a given output according to a set of instructions. The reproducibility of LM outputs may nonetheless be vulnerable to small changes in the prompt design. This calls into question the replicability of classification routines. To tackle this problem, researchers have typically tested a variety of semantically similar prompts to determine what we call "prompt stability." These approaches remain ad hoc and task-specific. In this article, we propose a general framework for diagnosing prompt stability by adapting traditional approaches to intra- and inter-coder reliability scoring. We call the resulting metric the Prompt Stability Score (PSS) and provide a Python package PromptStability for its estimation. Using six different datasets and twelve outcomes, we classify >150k rows of data to: a) diagnose when prompt stability is low; and b) demonstrate the functionality of the package. We conclude by providing best practice recommendations for applied researchers.
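The reliability-scoring idea behind the PSS can be illustrated with a toy computation. The function below is a simplified stand-in: it uses plain average pairwise agreement rather than a chance-corrected coefficient such as Krippendorff's alpha, and it is not the PromptStability package's actual API.

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Average pairwise percent agreement across repeated annotation runs.

    `annotations` is a list of runs, where each run holds the labels one
    prompt (or prompt paraphrase) assigned to the same rows of data. A
    simplified stand-in for the chance-corrected reliability coefficients
    the PSS adapts, not the package's API.
    """
    pairs = list(combinations(annotations, 2))
    total = sum(
        sum(a == b for a, b in zip(run_i, run_j)) / len(run_i)
        for run_i, run_j in pairs
    )
    return total / len(pairs)

# Three repeated runs of the same prompt over five rows:
runs = [
    ["pos", "neg", "pos", "neg", "pos"],
    ["pos", "neg", "pos", "pos", "pos"],
    ["pos", "neg", "neg", "pos", "pos"],
]
print(round(pairwise_agreement(runs), 3))  # low values flag unstable prompts
```

Running the same prompt many times (intra-prompt) or a set of paraphrased prompts (inter-prompt) and scoring agreement this way is the core diagnostic move; the paper's metric corrects for chance agreement on top of it.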


Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Huang, Kung-Hsiang, Zhou, Mingyang, Chan, Hou Pong, Fung, Yi R., Wang, Zhenhailong, Zhang, Lingyu, Chang, Shih-Fu, Ji, Heng

arXiv.org Artificial Intelligence

Recent advancements in large vision-language models (LVLMs) have led to significant progress in generating natural language descriptions for visual content and thus enhancing various applications. One issue with these powerful models is that they sometimes produce texts that are factually inconsistent with the visual input. While there has been some effort to mitigate such inconsistencies in natural image captioning, the factuality of generated captions for structured document images, such as charts, has not received as much scrutiny, posing a potential threat to information reliability in critical applications. This work delves into the factuality aspect by introducing a comprehensive typology of factual errors in generated chart captions. A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models, ultimately forming the foundation of a novel dataset, CHOCOLATE. Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies. In response to this challenge, we establish the new task of Chart Caption Factual Error Correction and introduce CHARTVE, a model for visual entailment that outperforms proprietary and open-source LVLMs in evaluating factual consistency. Furthermore, we propose C2TFEC, an interpretable two-stage framework that excels at correcting factual errors. This work inaugurates a new domain in factual error correction for chart captions, presenting a novel evaluation mechanism, and demonstrating an effective approach to ensuring the factuality of generated chart captions.


Kamala Harris has an artificial intelligence problem

FOX News

The jokes seemed to write themselves last week after the Biden administration announced Vice President Kamala Harris, known for her vapid word salad speeches and obvious gaslighting, would now run point on artificial intelligence. Even I jumped in on the action, noting on FOX Business that Harris was more associated with the word "artificial" than the word "intelligence." All joking aside, the future of AI technology is a serious issue. With her approval ratings in the toilet and President Biden showing obvious signs of age-related decline, Kamala Harris (and by that I mean the Democratic Party) urgently needs a way to rehabilitate her historically unpopular image ahead of the 2024 presidential race. This is not the way. On this issue, like so many before it, Harris is out of her depth.


How to Prepare Your Dataset for Machine Learning and Analysis

#artificialintelligence

The bedrock of all machine learning models and data analyses is the right dataset. After all, as the well-known adage goes: "Garbage in, garbage out"! However, how do you prepare datasets for machine learning and analysis? How can you trust that your data will lead to robust conclusions and accurate predictions? The first consideration when preparing data is the kind of problem you're trying to solve.
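Two of the most common prep steps, imputing missing values and rescaling, can be sketched for a single numeric column. The helper below is a minimal illustration (mean imputation plus min-max scaling), and it assumes the column has at least one non-missing value and more than one distinct value.

```python
def prepare_column(values):
    """Minimal prep for one numeric column: impute missing values (None)
    with the column mean, then min-max scale into [0, 1].

    Assumes at least one non-missing value and more than one distinct
    value; a real pipeline would also handle outliers, constant columns,
    and categorical encodings.
    """
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    return [(v - lo) / (hi - lo) for v in filled]

print(prepare_column([2.0, None, 4.0, 10.0]))
```

In practice you would fit the imputation mean and the min/max on the training split only, then apply them to the test split, to avoid leaking information.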


Artificial Intelligence Creates New Cybersecurity Worries

#artificialintelligence

Artificial intelligence (AI) is revolutionizing the way we view cybersecurity. While there are many benefits to AI, it comes with a range of challenges that organizations must address. This article will discuss how AI is changing the way we approach cybersecurity. We'll also cover some of the ways in which artificial intelligence can improve our security posture, as well as some of its drawbacks.


Will AI Short Circuit Cybersecurity? - AI Summary

#artificialintelligence

It is, to say the least, a very extensive report that raises important issues, but one can't help thinking that it might be self-serving in some cases, especially for the enormous tech companies that have already invested billions in AI and would like to control the degree of government intervention. That being said, it is well worth looking at the recommendations from the AI report and seeing whether or not they also apply to cybersecurity risk generally, as well as the cybersecurity, privacy, secrecy and safety risks of AI systems themselves. While the report is about AI, the recommendations apply equally well, if not more so, to cyberspace and cybersecurity risk. And then there is the cybersecurity of AI to consider as well as the use of AI in cybersecurity.


The AI Act: getting the first step right

#artificialintelligence

Artificial Intelligence (AI) has been compared to electricity: it is a general-purpose technology with applications in all domains of human activity. Electricity has found uses that no one envisaged when the first electrical systems were designed and, in practice, life would be completely different without this technology. Ideally, the Act would have developed the two central ideas addressed by the White Paper: creating legislation that stimulates innovation, while at the same time guaranteeing trust. However, in its current form, the document has a few drawbacks and needs to mature to meet the expectations of the AI community, in particular, and of society, in general. The main sections of the Act are concerned with prohibited practices, high-risk systems, transparency requirements, and governance.


Predicting Hotel Cancellations with Machine Learning

#artificialintelligence

As you can imagine, the cancellation rate for bookings in the online booking industry is quite high. Once a reservation has been cancelled, there is little that can be done, which is costly for businesses and creates a strong incentive to take precautions. Predicting which reservations are likely to be cancelled, and preventing those cancellations, therefore creates real value for these businesses. In this article, I will explain how machine learning methods can predict future cancellations in advance.
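As a sketch of the kind of model such an article typically builds, here is a tiny logistic-regression classifier in plain Python. The features (normalized booking lead time and a refundable-deposit flag) and the toy rows are hypothetical, not drawn from any real booking dataset.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Tiny logistic regression trained with stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))  # predicted cancellation probability
            err = p - yi                # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if 1 / (1 + math.exp(-z)) >= 0.5 else 0

# Hypothetical features per booking: [normalized lead time, refundable-deposit flag]
X = [[0.9, 1], [0.8, 1], [0.7, 1], [0.2, 0], [0.1, 0], [0.3, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = booking was eventually cancelled
w, b = train_logistic(X, y)
print([predict(w, b, xi) for xi in X])
```

A real project would use a library implementation (e.g. scikit-learn) with a held-out test set and far richer features; the point here is only the shape of the problem: tabular booking features in, a cancel/no-cancel prediction out.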


Are physicians worried about computers machine learning their jobs?

@machinelearnbot

The Journal of the American Medical Association (JAMA) published a viewpoint titled "Unintended Consequences of Machine Learning in Medicine" [Cabitza2017JAMA]. The title is eye-catching, and it is an interesting read touching upon several important points of concern to those working at the crossroads of machine learning (ML) and decision support systems (DSS). This viewpoint is timely, arriving at a time when others are also expressing concern about inflated expectations of machine learning and its fundamental limitations [Chen2017NEJM]. However, several points put forth as alarming in this piece are, in my opinion, unsupported. In this quick take, I hope to convince you that the reports of unintended consequences specifically due to ML have been greatly exaggerated.